Video transmission will become an important part of 
future multimedia communication because of dramatically 
increasing user demand for video and the rapid evolution of 
coding algorithms and VLSI technology. Video transmission 
will be a part of the broadband integrated services digital 
network (B-ISDN). Asynchronous transfer mode (ATM) is a 
viable candidate for implementation of B-ISDN due to its 
inherent flexibility, service independence, and high 
performance. According to the characteristics of ATM, the 
information has to be coded into discrete cells which 
travel independently in the packet-switching network. 

In this thesis, a practical realization of an ATM video 
codec called Mixture Block Coding with Progressive 
Transmission (MBCPT) is presented. This variable bit rate 
coding algorithm shows how a constant quality performance 
can be obtained according to user demand. Interactions 
between the codec and the network are emphasized, including 
packetization, service synchronization, flow control, and 
error recovery. Finally, some simulation results based on 
MBCPT coding with error recovery are presented. 
Chapter 1 Introduction 


Communicating images has traditionally been the 
specialty of postal services, and of late, the air freight 
industry. Early electronic means of communicating 
photographs used wire services that were essentially 
forerunners of facsimile. Fax is now moving toward plain 
paper and eventually color. Fax networks are also becoming 
increasingly popular for timely distribution of routine 
communications. But fax is only the tip of an imaging 
communications iceberg. We are on the cusp of an explosion in 
integrated imaging, and one effect will be a sharp rise in 
the demands on networks to move all these images. As color 
graphics and video become more prevalent, networking 
capabilities will have to increase further [1]. 

Due to the rapidly evolving fields of image processing 
and networking, video information promises to be an 
important part of tomorrow's telecommunication systems. Up 
to now, telecommunication traffic has been mainly 
transported over circuit-switched networks. Since 
packet-switched networks are likely to dominate the 
communications world in the near future, it is necessary to 
develop techniques for video transmission over such 
networks. 

The classic approach in circuit switching is to provide 
a "dedicated path", thus reserving a continuous bandwidth 
capacity in advance. Any unused bandwidth capacity on the 
allocated circuit with circuit-switching is therefore 
wasted. Rapidly varying frequency signals, like video 
signals, require too much bandwidth to be accommodated by a 
standard circuit-switching channel. With a certain amount 
of capacity assigned to a given source, if the output rate 
of that source is larger than the channel capacity, quality 
will be degraded. If the generating rate is less than the 
limit, the excess channel capacity is wasted. Another 
point that strongly favors packet-switched networks is the 
possibility that the integration of services in a network 
will be facilitated if all of the signals are separated 
into packets with the same format. 

Some coding schemes which support the packet video idea 
have been explored. Verbiest and Pinnoo proposed a 
DPCM-based system which consists of an intrafield / 
interframe predictor, a nonlinear quantizer, and a variable 
length coder[2]. Their codec obtains stable picture quality 
by switching between three different coding modes: 
intrafield DPCM, interframe DPCM, and no replenishment. 
Ghanbari has simulated a two-layer conditional 
replenishment codec with a first layer based on a hybrid 
DCT-DPCM and a second layer using DPCM [3]. This scheme 
generates two types of packets: "guaranteed packets" 
contain vital information and "enhancement packets" 
contain "add-on" information. Darragh and Baker presented 
a sub-band codec which attains user-prescribed fidelity by 
allowing the encoder's compression rate to vary [4]. The 
codec's design is based on an algorithm that allocates 
distortion among the sub-bands to minimize channel entropy. 
Kishino et al. describe a layered coding technique using 
discrete cosine transform coding, which is suitable for 
packet loss compensation [5]. Karlsson and Vetterli 
presented a sub-band coder using DPCM with a nonuniform 
quantizer followed by run-length coding for baseband 
information and PCM with run-length coding for the 
remaining information [6]. In this thesis, a different 
coding scheme called MBCPT is investigated. Unlike the 
methods mentioned above, MBCPT does not use decimation and 
interpolation filters to separate the signals into 
sub-bands. However, it retains the desirable aspects of 
sub-band coding by using variable blocksize transform 
coding. In Chapter 2, some of the important characteristics 
and requirements of packet video are discussed. In 
Chapter 3, some details of image data compression, scalar 
quantization, vector quantization and transform coding are 
introduced and the coding scheme, called Mixture Block 
Coding with Progressive Transmission, is discussed. In 
Chapter 4, the network simulator used in this thesis is 
introduced. In Chapter 5, the simulation results are 
discussed. Finally, in Chapter 6 a summary of this thesis 
is presented. 



Chapter 2 Packet Video 


In this chapter some background for packet video is 
presented, and some characteristics and requirements of 
packet video are demonstrated. 

2.1 Broadband Integrated Services Digital Network 

The demand for various services, such as telemetry, 
terminal and computer connections, voice communications, 
and full-motion high-resolution video, and the wide range 
of bit rates and holding times they represent, provide an 
impetus for building a Broadband Integrated Services Digital 
Network (B-ISDN). B-ISDN is a projected worldwide public 
telecommunications network that will service a wide range 
of user needs. Furthermore, the continuing advances in the 
technology of optical fiber transmission and integrated 
circuit fabrication have been the driving forces to realize 
the B-ISDN. 

The idea of B-ISDN is to build a complete end-to-end 
switched digital telecommunication network with broadband 
channels. Although the channel rates are still to be 
precisely defined by the CCITT (International Telegraph and 
Telephone Consultative Committee), the H4 channel, carried 
over fiber, has an access rate 
of about 135 Mbps. A user gains access to the B-ISDN by 
means of a local interface to a "digital pipe" of a certain 
bit rate. At any given point in time, the pipe to the 
user's premises has a fixed capacity, but the traffic on 
the pipe may be a variable mix up to the capacity limit. 
Thus a user may access circuit-switched and packet-switched 
services, as well as other services, in a dynamic mix of 
signal types and bit rates. 

The principal benefits to the user can be expressed in 
terms of cost savings and flexibility. Integrated 
services mean that the user does not have to buy multiple 
services to meet multiple needs. Further, the user needs to 
bear the expense of just a single access line to these 
multiple services. 

The B-ISDN can offer a variety of services, including 
existing voice and data transmission as well as: 

* Facsimile: services for the transmission and 
reproduction of graphics, handwritten, and printed 
material. 

* Teletext: service that enables the subscriber terminals 
to exchange correspondence. 

* Video: video conferencing, picturephone, DTV, HDTV. 

2.2 Video Transmission over Packet-Switched Networks 

Packet-switched networks have the unique 
characteristics of dynamic bandwidth allocation for 
transmission and switching resources and the elimination of 
channel structure [7]. Such a network acquires and releases 
bandwidth as it is needed. Because video signals vary greatly in 
bandwidth requirement, it is attractive to utilize a 
packet-switched network for video coded signals. Allowing 
the transmission rate to vary, video coding, based on 
packet transmission, permits the possibility of keeping the 
picture quality constant, implementing "bandwidth on 
demand". Summarizing the above, there are three main merits 
when transmitting video packets over a packet switching 
network: 

(1) Improved and consistent image quality: if video 
signals are transmitted over fixed-rate circuits, 
the coded bit rate must be kept constant, 
which degrades the image during 
rapid motion. 

(2) Multimedia integration: as mentioned in section 2.1, 
integrated broadband services can be provided using 
unified protocols. 

(3) Improved transmission efficiency: using variable 
bit-rate coding and channel sharing among multiple 
video sources, scenes can be transmitted without 
distortion if other sources, at the same time, are 
without rapid motion. 

However, packet switching has the following drawbacks: 

(1) The time taken to transmit a packet of data may 
change from time to time. 

(2) Packets of data may arrive very late or even get 
lost. 

(3) Headers of packets may be changed because of errors 
and delivered to the wrong receiver. 

It has to be emphasized that the delay can reach 
very high levels if many users are accessing the 
network. Under many conditions, the loss of packets or 
the erroneous receipt of other packets may seriously damage 
the quality of the image. Moreover, because of the strong 
interaction between the coding algorithm and the network on 
which it is applied, a new video coding approach is 
required. 

2.3 Interaction between Signal Processing and Networking 

Video transmission over a packet-switched network, or 
"packet video" for short, poses a general problem: a signal 
with high and greatly varying rate has to be transmitted in 
a constrained period. When the signals transmitted in the 
network are nonstationary and circuit-switching is applied, 
a buffer between the coder and the channel is needed to 
smooth out the varying rate. If the amount of data in the 
buffer exceeds a certain threshold, the encoder is 
instructed to switch into a coding mode that has a lower 
rate but worse quality to avoid buffer overflow. 


In packet-switched networks, Asynchronous Time Division 
Multiplexing (ATDM) can efficiently absorb temporal 
variations of the bit-rate of individual sources by 
smoothing out the aggregate of several independent streams 
in common network buffers. 

It is a difficult resource allocation and control 
problem to deliver packets in a limited time and provide a 
real time service, especially when the source generates a 
high and greatly varying rate. In packet-switching 
networks, packet losses are inevitable but they yield a 
better utilization of channel capacity. The video coder 
will require different channel capacity over time but the 
network will provide a channel whose capacity changes 
depending on the traffic in the network. 

There are some interactions between the coder and the 
network which we have to consider and which become a part 
of the specifications when we design the coder: 

(1) Adaptability of the coding scheme: The video source 
we are dealing with has a varying information rate. 
So it is expected that the encoder can generate 
different bit rates by removing the redundancy. When 
the video is still, there is no need to transmit 
anything. 

(2) Insensitivity to error: The coding scheme has to be 
robust to packet loss so that the quality of the 
image is never seriously damaged. Remember that 
retransmission is impossible because of the tight 
timing requirement. 

(3) Resynchronization of the video: Because of the 
varying packet-generating rate and the lack of a 
common clock between the coder and the decoder, we 
have to find a way to reconstruct the received data 
synchronously with the display terminal. 

(4) Control of coding rate: Sensing the heavy traffic in 
the network, the coding scheme is required to adjust 
the coding rate by itself. In the case of a congested 
network, the coder is switched to another mode which 
generates fewer bits while degrading image quality. 

(5) Parallel architecture: The coder can be implemented 
in parallel. That means we can run the coding 
procedure at a lower rate in many parallel streams. 

In the next chapter, we investigate a coding scheme to 
see if it satisfies the above requirements. 


Chapter 3 Image Data Compression and Mixture Block 
Coding with Progressive Transmission 


In this chapter, we introduce the basic concepts of 
image data compression. We also investigate a coding 
algorithm called Mixture Block Coding with Progressive 
Transmission (MBCPT). 

3.1 Image Data Compression 

Image data compression is a technique used to minimize 
the number of bits for representing an image. Typical 
television images have a spatial resolution of approximately 
512 x 512 pixels per frame. At 8 bits per pixel per color 
channel and 30 frames per second, the raw image data rate 
is about 1.8 x 10^8 bits/s. The large channel capacity and 
memory requirements for digital image transmission make 
image data compression desirable. 

There are two categories of image data compression. 
The first is lossless coding, which can recover the original 
image without any loss. The need for perfect recovery limits the 
compression rate that can be achieved. For larger 
compression rates, a second kind of coding scheme called 
lossy coding is applied. Lossy coding relies on many-to-one 
mappings to get a desired rate which is less than the 
source entropy. 

There are also two main ways to do lossy image data 
compression. The first method, which is called 
predictive coding, exploits the redundancy in the data. 
Because an image is a highly correlated source, there is a 
lot of predictability, called redundancy, in the image. 
Techniques such as delta modulation and differential pulse 
code modulation fall into this group. The second method, 
called transform coding, transforms the given image into 
another array such that a large amount of the information 
is packed into a small number of samples. A more detailed 
discussion is provided in Section 3.2. 

The entropy of an image source with L possible 
independent symbols with probabilities $p_i$, $i = 0, \ldots, L-1$, is 
defined as 

$$H = -\sum_{i=0}^{L-1} p_i \log_2 p_i \quad \text{bits per symbol} \tag{1}$$

In the simulated image used in the thesis, L equals 
256. According to Shannon's noiseless coding theorem, it is 
possible to losslessly code a source with an entropy of H 
bits per symbol using $H + \epsilon$ bits per symbol, where $\epsilon$ is an 
arbitrarily small positive quantity. In this case, the 
compression rate of lossless coding is defined by 

$$\text{compression rate} = \frac{\text{average bit rate of the original raw data } (B)}{\text{average bit rate of the encoded data } (C)} \tag{2}$$

In lossy coding, C can be much smaller than B. 
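
Equations (1) and (2) can be checked numerically on a small example. The following is a minimal sketch, not part of the thesis software: the function names and the toy pixel list are assumptions chosen so that the entropy comes out to a simple value.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Zeroth-order entropy H = -sum(p_i * log2(p_i)), in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def compression_rate(raw_bits, coded_bits):
    """Compression rate B / C as in equation (2)."""
    return raw_bits / coded_bits


# Toy 8-bit "image" (hypothetical): four gray levels with
# probabilities 1/2, 1/4, 1/8 and 1/8.
pixels = [0] * 8 + [64] * 4 + [128] * 2 + [255] * 2

H = entropy_bits(pixels)            # 1.75 bits per symbol
best = compression_rate(8.0, H)     # bound on B / C for this source model
```

For this toy source the raw rate is B = 8 bits per symbol, so a lossless coder that approaches the entropy would reach a compression rate of about 4.57.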

3.2 Transform Coding and DCT 

A variety of coding approaches have been developed for 
image compression. Some of the more promising involve 
segmenting the image into small subimages before coding. 
Specifically, the original image is divided into subimages 
which are usually of equal size. Then, each subimage is 
coded independently of the others. To reproduce the full 
image, the separate subimage blocks are reassembled by the 
decoder. The purpose of segmenting the image is to exploit 
the image’s local characteristics and to simplify hardware 
implementation of the coding algorithm. Transform coding is 
a prime example of a coding technique involving image 
segmentation [8]. In this section, the characteristics of 
transform coding are introduced and we investigate one 
important transform called the discrete cosine transform 
used in MBCPT. 

3.2.1 Transform coding 

Block coding, another name for transform coding, 
transforms a block of data into a set of transform 
coefficients and quantizes each coefficient independently. 
An image is divided into equal size blocks. The size of 
these blocks is limited by processing and storage ability. 
For an M x N image, if an m x n transform is applied, the 
image will be divided into MN/mn blocks. The main storage 
space for doing the transform is reduced by a factor of 
MN/mn. Meanwhile, the number of operations will be reduced 
by a factor of $\log_2(MN)/\log_2(mn)$. This follows from a 
two-dimensional transform requiring $O(N \log_2 N)$ operations 
via an N-point FFT. 

The aim of the transformation is to convert 
statistically dependent picture elements (pixels) into a 
set of essentially independent transform coefficients, 
preferably packing most of the signal energy (or 
information) into a minimum number of coefficients [9]. Bit 
allocation is another problem when designing a transform 
coder. If a coefficient contains a lot of energy, the 
absolute value is large, and more bits will be assigned to 
it. On the other hand, a coefficient with little energy 
will be represented with fewer bits, even none. There are 
two approaches used for bit allocation. In the first 
approach, only a predetermined set of transformed 
coefficients are transmitted. This approach is called zonal 
coding with the zone covering the set of coefficients with 
the largest variances. The second approach is threshold 
coding. In this approach those coefficients with amplitude 
greater than a predetermined threshold are coded. In MBCPT, 
zonal coding is used. 
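
The difference between the two bit allocation approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 4 x 4 coefficient block, the threshold of 10, and the choice of zone (the dc term plus the three neighbouring low-order terms) are hypothetical and are not taken from the thesis.

```python
def zonal_select(coeffs, zone):
    """Zonal coding: keep only coefficients at fixed (row, col) positions."""
    return [[c if (r, k) in zone else 0.0 for k, c in enumerate(row)]
            for r, row in enumerate(coeffs)]


def threshold_select(coeffs, threshold):
    """Threshold coding: keep coefficients whose magnitude exceeds threshold."""
    return [[c if abs(c) > threshold else 0.0 for c in row] for row in coeffs]


# Hypothetical 4x4 block of transform coefficients, dc term at [0][0].
block = [[120.0, 30.0, -2.0, 1.0],
         [-25.0, 4.0, 0.5, 0.0],
         [3.0, -1.0, 14.0, 0.0],
         [0.5, 0.0, 0.0, -0.2]]

# Assumed zone: dc plus three low-order ac positions.
ZONE = {(0, 0), (0, 1), (1, 0), (1, 1)}

zonal = zonal_select(block, ZONE)        # keeps 120, 30, -25, 4
thresh = threshold_select(block, 10.0)   # keeps 120, 30, -25, 14
```

In this example zonal coding discards the isolated coefficient 14 at position (2, 2) while keeping the small coefficient 4 inside the zone. Threshold coding does the opposite, but the receiver would then also need to be told which positions were kept.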

Asymptotically, DPCM and transform coding have the same 
performance. However, under practical constraints, 
transform coding is a much more powerful tool than 
predictive coding. It can achieve relatively higher 
compression rates and distributes the error coming from 
quantization or the channel over the entire image. If 
predictive coding is used, the visual degradation caused 
by errors will appear locally. 

3.2.2 Discrete Cosine Transform (DCT) 

Most unitary transforms pack a large fraction of the 
average energy of the image into relatively few 
transform coefficients. Since the total 
energy is preserved, this means many of the transform 
coefficients will contain very little energy. In terms of 
energy compaction and decorrelation, the Karhunen-Loeve 
transform is optimum. But the Karhunen-Loeve transform 
depends on the statistics of the image and, in general, the 
basis vectors have to be recomputed for different images. 
Even after the transform matrix has been computed, the 
transform itself requires a large number of computations. The 
discrete cosine transform is a good substitute for highly 
correlated images because it has excellent 
energy compaction and fast implementations [10]. 

The discrete cosine transform consists of a set of 
basis vectors that are sampled cosine functions. The 
transform matrix C = {c(k,n)} may be written 

$$c(k,n) = \begin{cases} \dfrac{1}{\sqrt{N}}, & k = 0,\ 0 \le n \le N-1 \\[2ex] \sqrt{\dfrac{2}{N}} \cos\dfrac{\pi (2n+1) k}{2N}, & 1 \le k \le N-1,\ 0 \le n \le N-1 \end{cases} \tag{3}$$

The two-dimensional DCT may be defined as 

$$F(u,v) = C\,[f(x,y)]\,C^{T} \tag{4}$$

and the inverse transform 

$$f(x,y) = C^{T}\,[F(u,v)]\,C \tag{5}$$

As mentioned above, the DCT is a fast transform. By the 
fast algorithm developed by Chen et al. [11], an N x N image 
DCT needs only $2N^2 \log_2 N - 2N^2 + 8N$ real multiplications and 
$3N^2 \log_2 (N/2) + 4N$ real additions. Because zonal coding is 
used in MBCPT and only some of the coefficients need to be 
calculated, the operations can be reduced further and the 
real time processor can be practically implemented. 
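
The orthonormal DCT matrix and the two-dimensional forward and inverse transforms can be written directly as matrix products. The sketch below uses plain Python lists and the direct matrix form, not the fast algorithm of Chen et al.; the helper names and the 4 x 4 test block are assumptions made for illustration.

```python
import math


def dct_matrix(N):
    """N x N orthonormal DCT matrix C = {c(k, n)}."""
    C = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        C.append([scale * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                  for n in range(N)])
    return C


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def dct2(f):
    """Forward 2-D DCT of a square block: F = C f C^T."""
    C = dct_matrix(len(f))
    return matmul(matmul(C, f), transpose(C))


def idct2(F):
    """Inverse 2-D DCT of a square block: f = C^T F C."""
    C = dct_matrix(len(F))
    return matmul(matmul(transpose(C), F), C)


# Hypothetical smooth 4x4 block (a gentle ramp).
f = [[10.0 + 2 * x + y for y in range(4)] for x in range(4)]
F = dct2(f)      # F[0][0] is N times the block mean for an N x N block
g = idct2(F)     # reproduces f up to floating-point rounding
```

Because the matrix is orthonormal, applying `idct2` to the output of `dct2` returns the original block, and for a smooth block such as this ramp most of the energy sits in the low-order coefficients.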

3.3 Quantization 

Quantization is the next step after sampling in image 
digitization. A quantizer maps a continuous variable u into 
a discrete variable u', which is a value from a finite 
set $\{r_1, r_2, \ldots, r_n\}$. For most image transforms, the dc 
coefficient is positive because the gray level is usually 
nonnegative. The ac coefficients have a zero mean and a 
distribution very much like the Laplacian model. In the 
following, the two specific quantizers used in this thesis 
are discussed. 

3.3.1 Scalar Quantizer 

A scalar quantizer is a one-dimensional quantizer which 
maps intervals of a line into points. The average 
distortion for a scalar quantizer is 

$$d = \frac{1}{n} \sum_{i=1}^{n} \int_{a_i}^{a_{i+1}} p(x)\, d(x, y_i)\, dx \tag{6}$$

where n is the number of codebook elements, p(x) is the 
probability density of the input, and $(a_i, a_{i+1})$ is 
the i-th interval containing element $y_i$. 

An optimal Laplacian quantizer is used in this thesis, 
developed with Max's optimization theory for 
minimum distortion. 

3.3.2 Vector Quantizer 

Vector Quantization (VQ) has been widely used in 
low-bit-rate compression. It is a generalization of scalar 
quantization, and is one step closer to the optimum, as 
given by Shannon's rate distortion theory [7]. In the image 
coding area, VQ is a new but promising technique for video 
compression. 

There are two steps involved in the type of Vector 
Quantization used in this study. First, a codebook is 
generated from a set of training vectors which should 
be as large and as varied as possible in order to 
accurately predict future vectors. The size of the codebook 
determines the bit rate of the vector quantizer. Second, 
the codebook is downloaded to both the transmitter and 
receiver. When an input vector arrives, the codebook is 
searched for the codevector which is the closest match to 
it, and an index representing that codevector is 
transmitted. The receiver only needs to look up the 
codevector for the received index, which is much easier than 
the search at the transmitter end. 

During both the codebook generation and the coding 
phases of vector quantization, it is necessary to find a 
"best match" for each vector. This best match should be the 
codevector which most closely approximates the input 
vector, or in other words, yields the lowest distortion. 

In this thesis, the LBG vector quantizer is used. The 
LBG algorithm is simple yet powerful, and it can be used 
for the generation of a codebook for any vector 
quantization application. The algorithm itself is an 
iterative one, refining the codebook until the distortion 
has reached an acceptable value. 

The distortion is simply the square of the Euclidean 
distance between the two vectors. The overall distortion 
measure is computed after all training vectors have been 
partitioned. If this distortion falls below the acceptable 
threshold, the iterative process stops, and the current 
codebook is saved. Otherwise, if the distortion is too 
high, each codevector is replaced by the centroid of all 
the training vectors assigned to it. Then the training 
sequence is re-partitioned and the process is repeated. 
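
The partition and centroid steps described above can be sketched as follows. This is a simplified illustration rather than the thesis implementation: the initial codebook is supplied by the caller (the usual LBG splitting initialization is omitted), the stopping rule is the absolute distortion threshold described in the text, and the names `lbg`, `nearest`, `sq_dist` and the `max_iter` guard are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, codebook):
    """Index of the codevector with the lowest distortion for vec."""
    return min(range(len(codebook)), key=lambda j: sq_dist(vec, codebook[j]))


def lbg(training, codebook, threshold, max_iter=100):
    """Refine a codebook until the average distortion reaches threshold.

    Returns the codebook and the last measured average distortion.
    max_iter is a safety guard (an assumption, not from the text).
    """
    codebook = [list(c) for c in codebook]
    distortion = float("inf")
    for _ in range(max_iter):
        # Partition: assign every training vector to its best match.
        labels = [nearest(v, codebook) for v in training]
        distortion = sum(sq_dist(v, codebook[j])
                         for v, j in zip(training, labels)) / len(training)
        if distortion <= threshold:
            break
        # Update: replace each codevector by the centroid of its cell.
        for j in range(len(codebook)):
            members = [v for v, lab in zip(training, labels) if lab == j]
            if members:  # an empty cell keeps its old codevector
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook, distortion


# Hypothetical two-dimensional training set with two obvious clusters.
training = [(0, 0), (0, 2), (10, 10), (10, 12)]
codebook, distortion = lbg(training, [(1, 1), (9, 9)], threshold=1.5)
```

With this toy data the first partition gives an average distortion of 4, the centroids move to (0, 1) and (10, 11), and the second partition gives an average distortion of 1, which meets the threshold.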

3.4 Mixture Block Coding with Progressive Transmission 

Here we investigate the algorithm and properties of MBCPT 
to see if it can properly fit into the packet-switching 
environment. 

3.4.1 Progressive Coding 

The technique that allows an initial image to be 
transmitted at a lower bit rate and then refined with 
additional bits is called progressive coding [12]. 
Consider, for example, an image of size 256 x 256 x 8 bits 
(x by y by z) to be transmitted. One way to send it is in the zxy 
order: transmit all eight bits of the first pixel in 
the first row, then step along the row (x) through all the 
pixels in that row, then advance down to the following row (y) 
until all the pixels in the image are sent. This is 
probably the simplest and most usual way to send an image. 
An alternative is to go through the xyz order, where 
the most significant bit of every pixel is sent first, then 
the second one and so on to the least significant bit. In 
this way, successive approximations converge to the target 
image with the first approximation carrying the "most" 
information and the following approximations enhancing it. 
The process is like focusing a lens, where the entire image 
is transformed from low-quality into high-quality [13]. 
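
The xyz ordering amounts to sending the image one bit plane at a time. The following minimal sketch, with hypothetical function names and a four-pixel toy "image", shows how each received plane refines the approximation.

```python
def bit_planes(pixels, bits=8):
    """Split pixel values into bit planes, most significant plane first."""
    return [[(p >> b) & 1 for p in pixels] for b in range(bits - 1, -1, -1)]


def reconstruct(planes, bits=8):
    """Rebuild pixel values from the planes received so far (MSB first)."""
    values = [0] * len(planes[0])
    for k, plane in enumerate(planes):
        weight = 1 << (bits - 1 - k)
        values = [v + weight * bit for v, bit in zip(values, plane)]
    return values


# Hypothetical 8-bit pixel values.
pixels = [200, 13, 97, 255]
planes = bit_planes(pixels)

# Approximation after receiving 1, 2, ..., 8 planes.
approximations = [reconstruct(planes[:n]) for n in range(1, 9)]

# Largest pixel error after each plane; it never increases.
errors = [max(abs(a - p) for a, p in zip(approx, pixels))
          for approx in approximations]
```

After the first plane the receiver holds `[128, 0, 0, 128]`, and after all eight planes it holds the original values exactly.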

In progressive coding, every pixel value or the 
information contained in it is possibly coded more than 
once and the total bit rate may increase depending on the 
coding schemes and quality desired. Because only the gross 
features of an image are being coded and transmitted in the 
first pass, the processing time is greatly reduced for the 
first pass and a coarse version of the image can be 
displayed without significant delay. It has been shown that 
it is very useful for perception to get a crude image in a 
short time, rather than waiting a long time to get a clear 
complete image [14]. 

With different stopping criteria, progressive coding 
is suitable for dynamic channel capacity allocation. If a 
predetermined distortion threshold is met, processing 
stops and no further refinement is carried out. The 
threshold value can be adjusted according to the traffic 
condition in the channel. In progressive coding, successive 
approximations (or iterations) are sent through the channel, 
converging to the desired image. If these successive 
approximations are marked with decreasing priority, then a 
sudden decrease in channel performance may only cause the 
received image to suffer from quality degradation rather 
than total loss of parts of the image [13]. 

3.4.2 Structure of MBCPT Coder 

Mixture Block Coding (MBC) is a variable-blocksize 
transform coding algorithm which codes the image with 
different blocksizes depending upon the complexity of that 
block area. Low-complexity areas are coded with a large 
blocksize transform coder while high-complexity regions are 
coded with a small blocksize one. The complexity of the 
specific block is determined by the distortion between the 
coded and original image. A more complex image block has 
higher distortion. 

The advantage of using MBC is that it does not process 
regions of different complexity with the same blocksize. That 
means MBC has the ability to choose a finer or coarser 
coding scheme to deal with parts of the same image that 
differ in complexity. For the same coding rate, MBC is able to code 
an image with greater fidelity than a coding scheme which 
codes regions of varying complexity with the same blocksize 
coder. 

When using MBC, the image is divided into maximum 
blocksize blocks. After coding, the distortion between the 
reconstructed and original block is calculated. The block 
being processed is subdivided into smaller blocksize blocks 
if that distortion fails to meet the predetermined 
threshold. The coding-testing procedure continues until the 
distortion is small enough or the smallest blocksize is 
reached. In this scheme, every block is coded until the 
reconstructed image is satisfactory, then the next block is 
coded. 

Mixture Block Coding with Progressive Transmission 
(MBCPT) is a coding scheme which combines MBC and 
progressive coding. MBCPT is a multipass scheme in which 
each pass deals with a different blocksize. The first pass 
codes the image with the maximum blocksize and transmits it 
immediately. Only those blocks which fail to meet the 
distortion threshold go to the second pass. In the second 
pass, the difference between the original image block and the 
coded block obtained in the first pass is processed with smaller 
size blocks. The difference image coding scheme continues 
until the final pass, which deals with the minimum size 
block. At the receiving end, a crude image is obtained from 
the first pass in a short time and the data from the following 
passes serve to enhance it. Fig. 3.1.a shows the structure 
of the 16x16 pass for MBCPT. Fig. 3.1.b shows the parallel 
structure of MBCPT. A coding structure like a quad tree, 
proposed by Dreizen [15] and by Vaisey and Gersho [16], 
subdivides busy blocks into four pieces and is 
used in this thesis. In the quad tree coding structure of 
this thesis, the 16x16 block is coded and the distortion of 
the block is calculated. If the distortion is greater than 
the predetermined threshold for 16x16 blocks, the block is 
divided into four 8x8 blocks for additional coding. This 
coding-checking procedure is continued until the only image 
blocks not meeting the threshold are those of size 2x2. 
Figure 3.2 shows the algorithm. 
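
The multipass, quad tree structure described above can be sketched in Python. This is a simplified illustration under several assumptions, not the coder of Figure 3.2: quantization is omitted entirely, the four retained coefficients are taken to be the 2 x 2 low-order corner of the DCT, one threshold is shared by all passes, the image side is a multiple of 16, and all function names and the test image are hypothetical.

```python
import math


def dct_matrix(N):
    """N x N orthonormal DCT matrix."""
    return [[(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
             * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
             for n in range(N)] for k in range(N)]


def code_block(block):
    """Zonal DCT coding of a square block (no quantization, an assumption):
    keep the 2x2 low-order coefficients and return the reconstruction."""
    N = len(block)
    C = dct_matrix(N)
    F = [[sum(C[u][x] * block[x][y] * C[v][y]
              for x in range(N) for y in range(N))
          for v in range(2)] for u in range(2)]
    return [[sum(C[u][x] * F[u][v] * C[v][y]
                 for u in range(2) for v in range(2))
             for y in range(N)] for x in range(N)]


def mbcpt(image, threshold, sizes=(16, 8, 4, 2)):
    """Return the reconstruction and the number of blocks coded per pass."""
    n = len(image)
    recon = [[0.0] * n for _ in range(n)]
    active = [(r, c) for r in range(0, n, sizes[0])
              for c in range(0, n, sizes[0])]
    blocks_per_pass = []
    for i, size in enumerate(sizes):
        blocks_per_pass.append(len(active))
        next_active = []
        for r, c in active:
            # Code the difference between the original and what was sent.
            residual = [[image[r + x][c + y] - recon[r + x][c + y]
                         for y in range(size)] for x in range(size)]
            coded = code_block(residual)
            for x in range(size):
                for y in range(size):
                    recon[r + x][c + y] += coded[x][y]
            # Maximum absolute difference over the block.
            d = max(abs(image[r + x][c + y] - recon[r + x][c + y])
                    for x in range(size) for y in range(size))
            if d > threshold and i + 1 < len(sizes):
                h = size // 2
                next_active += [(r, c), (r, c + h), (r + h, c), (r + h, c + h)]
        active = next_active
    return recon, blocks_per_pass


# Hypothetical 16x16 image: flat background with one bright pixel.
image = [[100.0] * 16 for _ in range(16)]
image[0][0] = 200.0
recon, counts = mbcpt(image, threshold=5.0)
```

In this sketch a block that stops early already meets the threshold, and a block that reaches the 2 x 2 pass is reproduced exactly because all four of its coefficients are kept, so the final maximum error never exceeds the threshold. With real quantizers that guarantee would no longer hold at the last pass.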

3.4.3 Design Consideration 

There are several features which have to be considered 
when designing a MBCPT coder. 

They are: 

(1) the blocksize of the transform coder. 

(2) the bit allocation. 

(3) the quantizer. 

(4) the distortion measurement. 

(5) the threshold value. 

The block size should be small enough for ease of 
processing and storage requirements, but large enough to 
limit the inter-block redundancy [17]. A larger block size 
results in higher image quality, but it is very difficult 
to build real-time hardware for block sizes larger than 
16x16 because the number of calculations for the DCT grows 
rapidly with block size [13]. Besides, if 
the maximum blocksize is set too large, most blocks are 
destined to be subdivided, which decreases the efficiency of 
the coder. So, 16x16 is chosen to be the largest blocksize here. 

The minimum blocksize determines the finest visual 
quality that is achievable in busy areas. If the minimum 
blocksize is too large, blockiness is likely to 
be observed along the coded edges of round objects 
because the coding block is square. In order to match the 
zonal transform coding used in this thesis, 2x2 is the 
smallest blocksize and there are four passes (16x16, 8x8, 
4x4, 2x2) in this scheme. Figs. 3.3 to 3.6 show the images 
from the four passes individually. 

The monochrome images used in this thesis are 
represented with 8 bits of non-negative intensity ranging 
from 0 to 255. After a discrete cosine transformation, only 
four coefficients, the dc and the three lowest order 
frequency coefficients, are coded and the others are set to 
zero. The dc coefficient in the first pass is coded with an 
8-bit uniform quantizer because it closely 
reflects the average gray level of that image block and is 
hard to predict. It is easier to predict the dc coefficient 
in the following passes because it is a residual and has a 
distribution like a Laplacian model. Typically, a 5-bit 
optimal Laplacian nonuniform quantizer is used. The three 
ac coefficients, as mentioned above, fit a Laplacian model 
with a variance greater than that of the dc coefficient. 
Because different coefficients exhibit different variances, 
the input samples are first normalized so 
that they have unit variance and can therefore be quantized 
with the same 5-bit Laplacian quantizer. As an alternative, 
an LBG vector quantizer with a codebook of size 512 is used to 
quantize the vector which comprises the three ac 
coefficients. With the blocksizes determined above, 
the bit rate of this coder ranges 
from 0.09 to 6.65 bits/pel for the scalar quantizer and 
from 0.07 to 4.66 bits/pel for the vector quantizer, 
depending upon the complexity of the image. 
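
The quoted bit rate limits appear to follow from the bit allocation just described. The short calculation below reproduces them under three assumptions that are not stated explicitly in the text: the minimum rate corresponds to sending only the 16x16 pass, the maximum corresponds to every block being subdivided in every pass, and the one-bit side information per block is not counted. The function name and its parameters are hypothetical.

```python
SIZES = (16, 8, 4, 2)   # blocksize of each pass


def bits_per_pel(dc_first, dc_later, ac_bits, passes):
    """Bits per pixel when the first `passes` passes code every block."""
    total = 0.0
    for i, size in enumerate(SIZES[:passes]):
        dc = dc_first if i == 0 else dc_later
        total += (dc + ac_bits) / (size * size)
    return total


# Scalar quantizer: 8-bit dc (first pass), 5-bit dc (later passes),
# three ac coefficients at 5 bits each.
scalar_min = bits_per_pel(8, 5, 3 * 5, passes=1)
scalar_max = bits_per_pel(8, 5, 3 * 5, passes=4)

# Vector quantizer: a 512-entry codebook needs a 9-bit index
# for the three ac coefficients.
vq_min = bits_per_pel(8, 5, 9, passes=1)
vq_max = bits_per_pel(8, 5, 9, passes=4)
```

Rounded to two decimals these give 0.09 and 6.65 bits/pel for the scalar quantizer and 0.07 and 4.66 bits/pel for the vector quantizer, matching the figures quoted above.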

Any distortion measure can be used in this MBCPT coder. 
It is possible to use different distortion measures for 
each different blocksize pass to adjust for the expected 
radial frequency coding sensitivity of the eye. Each 
different blocksize represents a different spatial 
frequency range that is to be coded, and details of 
distortion induced within each of these blocksizes will be 
seen differently by the eye. In this thesis, the maximum 
absolute difference is used: 

d = \max_i |u_i - v_i|    (7)

where the range of i is taken over the entire block to be
coded, u_i is the original image pixel and v_i is the coded
image pixel. Because of the visual sensitivity mentioned
above, a luminance-to-contrast model called the logarithmic
law,

C = 50 \log_{10} f ,  1 < f < 100    (8)
is used to modify the maximum absolute difference law.

The threshold of each pass has to be selected before the
coder starts. It can be readjusted during operation. If
zero is assigned as the threshold for each pass, no block
will satisfy the threshold, and the maximum data rate is
transmitted in the hope of a perfectly coded image. With an
infinite threshold, only the first-pass data is sent, at
the minimum bit rate. Any non-negative threshold falls
between these two extreme cases and can be adjusted
according to the channel condition and the quality
required.
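
As a minimal sketch of the threshold test, the maximum
absolute difference of Eq. (7) can be computed per block and
compared with the threshold of the current pass. Blocks are
represented here as lists of rows. The strict comparison is
an assumption, as is the omission of the logarithmic
modification of Eq. (8), whose exact use is not spelled out
in the text.

```python
def max_abs_difference(original_block, coded_block):
    """Eq. (7): largest absolute pixel difference within one block."""
    return max(abs(u - v)
               for row_u, row_v in zip(original_block, coded_block)
               for u, v in zip(row_u, row_v))

def needs_next_pass(original_block, coded_block, threshold):
    """True if the block fails the threshold and must be refined.

    Assumption: a block fails when its distortion strictly exceeds
    the threshold.
    """
    return max_abs_difference(original_block, coded_block) > threshold
```

With `threshold=float("inf")` no block is ever refined, which
corresponds to the minimum-rate case described above.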

Because only those blocks which fail to meet the
distortion threshold need to be coded in the next pass,
some side information is required to tell the receiver how
to reconstruct the image. One bit of overhead is needed for
each block. If a block is to be divided, its overhead bit
is 1; if not, it is 0. The coding process in Fig. 3.7 has
the following overhead:
1 , 1001 , 1001 , 1001 , 1001 , 1001 . 
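
The overhead rule can be sketched as a level-by-level
traversal of one 16x16 block. This is an illustration only:
the predicate `needs_split`, the child ordering, and the
assumption that 2x2 blocks carry no overhead bit (they cannot
be divided further) are not taken from the thesis, so the
output is not claimed to match Fig. 3.7.

```python
def overhead_bits(needs_split, top_size=16, min_size=2):
    """Emit one overhead bit per block, pass by pass.

    needs_split(x, y, size) -> bool is a hypothetical predicate that
    reports whether the block at (x, y) failed its threshold.
    """
    bits = []
    current = [(0, 0)]          # blocks examined in the current pass
    size = top_size
    while current and size > min_size:
        next_level = []
        half = size // 2
        for x, y in current:
            split = needs_split(x, y, size)
            bits.append(1 if split else 0)
            if split:
                # The four sub-blocks are examined in the next pass.
                next_level += [(x, y), (x + half, y),
                               (x, y + half), (x + half, y + half)]
        current = next_level
        size = half
    return bits
```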

3.4.4 Distortion and Blocking Effect 

When MBCPT is used, several types of error can appear in a
decoded image. First, high-frequency errors result from
eliminating DCT coefficients, using zonal masking, and
using large thresholds. High-frequency errors are
characterized by a general blurring of sharp edges in the
reconstructed image. Another type, quantization error,
occurs when DCT coefficients are assigned too few bits by
the bit assignment map. Quantization error is characterized
by sinusoidal rippling of intensity in originally solid
areas; edges remain fairly sharp but are distorted [8].

In MBCPT, the input image is partitioned into a series
of nonoverlapping rectangular blocks, or subimages, of
equal size. Each subimage is a partial scene of the
original image and is processed independently. In low
bit-rate applications, such as the first pass, the block
boundaries become highly visible and objectionable. Two
approaches are used in this thesis to reduce the blocking
effect. First, because the locations of the block edges are
known exactly in MBCPT, it is reasonable to expect that
low-pass filtering the image at or near the subimage
boundaries could smooth the unwanted discontinuities. This
is the basis of the filtering method [18]. A 3 x 3 Gaussian
spatial domain filter (Fig. 3.8) is used. Second, instead
of forcing the regions to be exclusive of each other, a
slight overlap around the perimeter of each region can
reduce the blocking effect. This is called the overlap
method (Fig. 3.9) [18]. The pixels at the perimeter are
then coded in two or more regions. In reconstructing the
image, a pixel that was coded more than once uses the
average of its coded values.

Both methods are successful in reducing the blocking
effect. However, the overlap method results in a 13%
increase in bit rate, while the filtering method, due to
its low-pass nature, may degrade edge content in the image.
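
The reconstruction step of the overlap method can be
sketched as follows. The data layout (a list of
`(top, left, block)` tuples, with each block a list of rows)
is an assumption made for illustration; only the averaging of
multiply coded pixels comes from the text.

```python
def merge_overlapping_blocks(height, width, coded_blocks):
    """Rebuild an image from overlapping decoded blocks.

    A pixel covered by several blocks receives the average of its
    decoded values; uncovered pixels are set to 0.0.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top, left, block in coded_blocks:
        for r, row in enumerate(block):
            for c, value in enumerate(row):
                total[top + r][left + c] += value
                count[top + r][left + c] += 1
    return [[total[r][c] / count[r][c] if count[r][c] else 0.0
             for c in range(width)]
            for r in range(height)]
```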

3.4.5 Application in Packet-Switching Network 

Because of the dynamic and adaptive character of MBCPT,
it offers some interesting features when applied to packet
video:

(1) A minimal quality is ensured (i.e., that of the
basic channel, which has higher priority).

(2) Packet losses on the improvement channel do not 
impair the received signal below the quality offered 
by the basic channel. 

(3) Bandwidth on demand can be easily implemented. 

(4) The scheme is very simple since all complexity is in 
the basic channel codec which operates at low 
frequency. 

(5) An evolutionary transition from today's synchronous
networks to tomorrow's asynchronous networks becomes 
possible, since the basic channel is implemented now 
and the improvement channel can be added in the 
future on the fast packet network [19]. 

In this chapter, the structure and basic features of
MBCPT were investigated. From these, it can be seen why the
algorithm fits a packet-switching environment. Details
about the actual implementation in a packet network
environment will be discussed in Chapter 5.

Figure 3.1.a Structure of pass 16 x 16 for MBCPT.
d is the distortion defined in Eq. (7).

Figure 3.7 Overhead assignment and zonal coding

Chapter 4 Network Simulator


The network simulator to be used for this thesis is a 
modification of an existing simulator developed by Nelson 
et al.[20]. A brief description of the simulator is 
provided here. 

4.1 Introduction 

As mentioned in Chapter 2, tomorrow's integrated
telecommunication network will have a very complicated and
dynamic structure. Its efficient use will require
sophisticated monitoring and control algorithms.
Communication between nodes will have to reflect the
existing capacity and reliability of system components. The
scheme for communicating information regarding the
operating status is called the system protocol.

Since this communication of system information must
flow through the channel, it reduces the overall capacity
of the physical layers, but hopefully provides a more
efficient system overall. The optimal system efficiency
therefore depends heavily on these protocols and, in turn,
on the system topology, communication channel properties,
nodal memory and component reliability. Most network
protocols have been developed for highly reliable
topological structures with reasonably high channel
reliability.

Most of the modifications made to this simulator for the
purpose of this thesis are in the modules concerning the
network layers. The simulator is structured in modules
which represent, to some degree, the ISO model for
packet-switched networks. A more detailed description of
the network layer modules is therefore given in the next
section. In this chapter, an overall picture of the
simulator is provided.

4.1.1 Topology, Traffic and Preparation 

The program Topology is used to generate a topological
description of the network to be simulated. It contains the
number of nodes, the definition (including connectivity and
propagation delay) of the links between nodes, and the
initial bit error rate for each link.

The program Traffic is used to generate an initial 
statistical description of the network traffic to be 
simulated. It contains the average message length for each 
precedence, the percent of messages generated for each 
precedence level, the rate of message generation at each 
node and the distribution of those messages to the other 
nodes. 

The program Simprep is used to generate a checkpoint
file which contains all the data needed for the simulation,
including the topology, traffic and network parameters for
the various network layers.

4.1.2 Simulator Philosophy 

The principal function of the simulator is to perform
tasks at the appropriate time. A queue called SIM_Q drives
the simulator. The records in SIM_Q contain:

1. The task to be performed. 

2. The time at which the task will be completed. 

3. Node 1 (sender). 

4. Node 2 (receiver). 

5. Line (channel line routed).

6. The message number and the packet number. 

7. A pointer to a packet (if one is involved). 

8. Queue pointers for a doubly linked list. 

The main simulator program pops SIM_Q and executes the
routines which complete the scheduled task contained in the
popped record. These completion routines simulate the
completion of the task and may result in further tasks to
be performed in the same layer or in other layers. A new
task is queued in the appropriate queue. If it is for
another layer and the processor for that layer is idle, the
scheduler for that layer is invoked.
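
The event-driven principle can be illustrated with a small
sketch. It is not the simulator's code: the original keeps
SIM_Q as a doubly linked list, whereas this sketch
substitutes a binary heap from the Python standard library,
and the class, method and task names are hypothetical.

```python
import heapq
import itertools

class Simulator:
    """Toy event loop: pop the earliest task and run its handler."""

    def __init__(self):
        self._queue = []                  # stands in for SIM_Q
        self._tie = itertools.count()     # keeps equal-time tasks in FIFO order
        self.now = 0.0
        self.log = []

    def schedule(self, completion_time, task, **fields):
        """Queue a task record with its completion time."""
        heapq.heappush(self._queue,
                       (completion_time, next(self._tie), task, fields))

    def run(self, handlers):
        """Execute tasks in time order; handlers may schedule new tasks."""
        while self._queue:
            self.now, _, task, fields = heapq.heappop(self._queue)
            self.log.append((self.now, task))
            handlers[task](self, **fields)
```

A handler that completes one task can call `schedule` to
queue the follow-up task for the same or another layer,
which mirrors the chaining of completion routines described
above.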


4.1.3 Simulator Queue and Queue Processors

Central to the operation of the simulator are the
various queues. Two types of records are entered into
queues. One is the Sim_Q_Record, which contains the
information required to perform a task, and the other is
the Packet_Record, which contains the information regarding
the contents and status of a packet. The main program SIMEX
works directly from SIM_Q, which is the queue of
Sim_Q_Records. There is only one such queue for the entire
simulator, but there are many packet queues. There are
three kinds of packet queues, referred to as Memory_Q,
Packet_Q and Cleanup_Q in the simulator.

The Memory_Q group comprises the Message_Q, which
contains all packets originating at this node, and the
Transfer_Q, which contains all packets received from other
nodes. The Packet_Q is used to simulate the nodal queues in
which the packets reside as they progress through the
various network layers. These queues are mutually exclusive
in that a packet can reside in only one of them at any
given time. The Transport_Q holds packets waiting for
packetization or reassembly in the transport layer. A
packet waiting for routing is placed in the Input_Q in the
network layer. A packet heading for another node is placed
in the Output_Q, waiting for transmission by the datalink
layer.


If piggy-back acknowledgement is allowed, then it is
possible that a packet's address from the sending node must
be stored for a period of time before the opportunity
exists to return the address in an acknowledgement. In the
simulator, this is accomplished through the Cleanup_Q.

4.2 The Network Layers 

Each layer module of the simulator contains a processor
and one or more packet queues. The processor is idle until
a packet enters its associated queue. The packet and the
task that must be performed are then entered into SIM_Q
with a completion time. When the completion time arrives
and the task has been performed, the queue is checked. If
there is another task to be performed, its completion is
scheduled. If the queue is empty, the processor is marked
idle again.

The layers in the simulator are quite close in 
operation to the ISO transport, network and datalink 
layers. A "partial" session layer exists principally as a 
reporting layer for end to end statistics. 

4.2.1 The Session Layer 

In the OSI model, the session layer allows users on
different machines to establish "sessions" between them. In
the simulator, as mentioned above, it is a relatively
simple model of the subscribers and an end-to-end
statistics collector. At message arrival time, the session
layer generates the message with all of its randomly
selected attributes and, if flow control or node hold-down
is not in effect, submits it to the transport layer and
then schedules the next message arrival time.

During initialization, a task "SL_Rcv_Msg" for each
node is queued in SIM_Q for the arrival time of the first
message at that node. When this task is executed by the
simulator, a message packet is generated and placed in the
transport queue. The arrival of the next message is then
queued in SIM_Q with the same task and an arrival time
determined by the random number generator (Poisson
distribution).
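
One common way to generate such arrivals is sketched below,
under the assumption that "Poisson distribution" refers to a
Poisson arrival process, i.e. exponentially distributed times
between consecutive messages. The function name and
parameters are illustrative and do not come from the
simulator.

```python
import random

def message_arrival_times(rate, horizon, seed=None):
    """Arrival times of a Poisson process with the given mean rate.

    rate    -- average number of messages per unit time (assumed)
    horizon -- simulation time at which generation stops
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate)         # time of the first arrival
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)    # exponential inter-arrival gap
    return times
```

In an event-driven simulator the same idea is applied one
step at a time: each executed arrival task schedules the next
one at the current time plus a fresh exponential gap.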

The only other task performed at the session layer is 
the SL_Snd_Msg task which simulates the delivery to the 
subscriber. In the simulator, this is principally a 
"bookkeeping task" that records message statistics and 
"cleans up" the queues containing packets with resolved 
references. 

4.2.2 The Transport Layer 

The basic function of the transport layer is to
receive the message from the session layer, split it into
smaller units if necessary, pass these to the network
layer and make sure that the pieces arrive in sequence at
the other end. Furthermore, all this work is expected to
be done efficiently, and in a way that isolates the session
layer from future progress in hardware technology.

In the simulator, the transport layer simulates
packetization, reassembly, message acknowledgement and
resubmittal in the case that a message acknowledgement is
not received in time (a transport-layer time-out). Four
tasks are simulated by the transport layer: TL_Packetize,
TL_Timeout, TL_Reassemble and TL_Ack_Send, where TL stands
for Transport Layer. It is recognized that in some
networks, packetization takes place at the network level,
leaving the transport layer responsible only for
message-level structures. Reassembly, depending upon the
protocol, can take place as low as the datalink level.
These tasks were both placed in the transport layer for
ease of coding, but they are separate modules that could
quite easily be extracted and placed elsewhere. Also, the
system was originally designed for datagram operation, and
since the packets will not necessarily arrive in order, it
is unlikely that reassembly would take place at the
datalink level.

4.2.3 The Network Layer 

The network layer is concerned with controlling the
operation of the network. A key design issue is determining
how packets are routed from source to destination. Another
issue is how to avoid the congestion caused when too many
packets are presented to the network at the same time. In
the simulator, the network layer performs all of the
functions related to these two aspects, with the exception
of flow control, which takes place at the session layer,
and the recovery protocols, which require some service from
the datalink layer. It also activates new channels when
needed and determines when packets originating at other
nodes are to be discarded.

The network layer is currently the most dynamic with
regard to the coding of modules. Five modules currently
comprise the network layer. Three are relatively static:
one module for dialing up new lines when more line capacity
is required and releasing them when not needed, one module
for the network processor and queue handling, and one
module for the routines which are common to most routing
algorithms. This leaves two modules for the dynamic parts
of the routing and flow control algorithms.

4.2.4 The Datalink Layer 

The main task of the datalink layer is to take a raw
transmission facility and transform it into a line that
appears free of transmission errors to the network layer.
It simulates the sending of the message over the channel
and the delivery at the other end. When a packet is
received, the datalink acknowledgement is initiated either
by piggy-back acknowledgement or by generating a datalink
acknowledgement packet.

As mentioned previously, the datalink level also
simulates the physical layer on a statistical basis. If
correct transmission was indicated (through a random number
generator), then acknowledgement was also assumed. Current
datalink layer simulation modules include generation of
acknowledgement packets and simulation of the piggy-back
acknowledgement as well. When a line is "brought up",
health packets are used to establish initial connections.
Also, when a line "goes down", an active node will
immediately issue health check packets to ascertain when
the channel is again available.

4.3 Modifications 

A major problem of using this system as a simulation 
tool for the study of packet video is that the system 
doesn't actually transmit the data from node to node. While 
a packet is transmitted, the data field is empty. Therefore 
modifications had to be made to the simulator to 
accommodate the video data. 

In the sending node, a field called "Image", which
contains real image data, is attached to the record
"Packet_Ptr" allocated to the message generated in the
session layer. There are three new modules in this layer.
First, "Get_Image" puts the image data into the image field
of a message generated at a specific time and node. Second,
"Image_Available" checks whether there is still image data
that needs to be transmitted. If so, the next message
generated at that node is again an image message and
contains some image data. Third, "Receive_Image" collects
the image data in the session layer of the receiving node
when the flag "Image_Complete" is on. In module
"Session_Msg_Arrive", different priorities are assigned to
different messages. In module "Session_Msg_Send", some
statistics are calculated, including the number of lost
image packets and the transmission delay for image
packets.

In the original version of the simulator, the transport
layer simply duplicated the same packet with different
sequentially assigned packet numbers without actually
packetizing the message. The module "Transport_Packetize"
was modified to actually packetize the image data residing
in the message record queued in Transport_Q when it is
called. The module "Transport_Reassemble" is called to
reassemble these image packets according to their packet
number when the flag "Image_Content" defined in Packet_Ptr
is true.
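
The modified behaviour can be pictured with the following
sketch. It is a simplified stand-in for the simulator
modules: the dictionary fields and function names are
hypothetical, and only the idea of splitting the image data
into numbered packets and reassembling by packet number is
taken from the text.

```python
def packetize(message_number, image_data, payload_size):
    """Split the image bytes of one message into numbered packets."""
    return [
        {"message": message_number,
         "packet": number,
         "image": image_data[start:start + payload_size]}
        for number, start in enumerate(range(0, len(image_data), payload_size))
    ]

def reassemble(packets):
    """Rebuild the image bytes, ordering packets by packet number."""
    ordered = sorted(packets, key=lambda p: p["packet"])
    return b"".join(p["image"] for p in ordered)
```

Sorting by packet number makes reassembly independent of the
arrival order, which matters for datagram operation where
packets may arrive out of sequence.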

The network layer is responsible for routing and 
flow-control. This module is already very well developed, 
so the modifications to be performed here were relatively 
minor. 

In the datalink layer, in order to simulate the
delivery of packets through the channel, a new packet is
generated at the receiving node, and the information,
including the image data, from the transmitted packet
(which is still resident at the sending node) is copied
into it. With the bit error rate defined in the program
Topology, the transmission success rate is set, and bit
errors can be inserted in both the data and the control
bits of the packet. Errors in the control bits are
simulated separately as long as the error rates are
consistent. If an error occurs in the control bits, the
transmission fails and the packet has to be sent again,
depending on the timeout threshold.
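
A minimal sketch of how a bit error rate translates into a
packet success rate and into corrupted data is given below.
It assumes independent bit errors, which the text does not
state explicitly, and the function names are illustrative.

```python
import random

def packet_success_probability(bit_error_rate, n_bits):
    """Probability that all n_bits arrive intact (independent errors)."""
    return (1.0 - bit_error_rate) ** n_bits

def inject_bit_errors(data, bit_error_rate, rng):
    """Return a copy of data with each bit flipped with the given probability."""
    corrupted = bytearray(data)
    for byte_index in range(len(corrupted)):
        for bit in range(8):
            if rng.random() < bit_error_rate:
                corrupted[byte_index] ^= 1 << bit
    return bytes(corrupted)
```

For example, a 1000-bit packet on a link with a bit error
rate of 0.001 gets through without any error a little over
one third of the time under this model.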

Besides the modifications made in those layer modules,
some new memory elements have to be allocated for image
messages and packets. To make sure that the simulation is
run in the steady state, image data is made available only
after some simulation time has elapsed.

In the next chapter, the interaction between this 
simulator and the coding scheme investigated in the 
previous chapter will be presented. 


Chapter 5 Simulation Results 


In this chapter, an interframe coder based on MBCPT
will be introduced and the simulation results will be
discussed.

5.1 Differential Interframe Coding 

Teleconferencing, picturephones, and broadcast videos 
are all transmitted as sequences of two dimensional images. 
An interframe coder is used to exploit the redundancy 
between the successive frames. The differences between 
frames basically come from object and camera motion. 

During this work we examined two interframe coders,
both of which are differential schemes based on MBCPT. Both
coders process the difference image between the current
frame and a reconstruction of the previous frame. In the
first case the reconstruction is locally decoded from the
first three passes. This algorithm is shown in Fig. 5.1.
Fig. 5.2 shows the second case, in which the reconstruction
is performed with all four passes. To select the preferred
algorithm, we compare the simulation results from these two
approaches. When there is no packet loss, the performance
of both schemes is the same (Fig. 5.3). But when congestion
occurs in the network, given the priorities assigned to
packets, packets from pass 4 are expected to be discarded
first. In this case, the performance (Fig. 5.4) of the
scheme in Fig. 5.1 is much better than that of the scheme
in Fig. 5.2. Based on this result, the coding scheme in
Fig. 5.1 was selected in our simulation.
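
Why the first scheme is robust to the loss of pass 4 can be
seen in a toy model. The sketch below is not MBCPT: the four
passes are replaced by a hypothetical successive-refinement
quantizer on flat lists of pixels. The point illustrated is
only that, when the frame memory is built from the first
three passes at both ends, dropping pass 4 affects the
displayed frame but never the prediction loop.

```python
def toy_passes(diff, steps=(64, 16, 4, 1)):
    """Hypothetical stand-in for four passes: successive refinement."""
    passes, residual = [], list(diff)
    for step in steps:
        layer = [step * round(r / step) for r in residual]
        residual = [r - l for r, l in zip(residual, layer)]
        passes.append(layer)
    return passes

def encode_sequence(frames, reference_passes=3):
    """Differential coding; frame memory uses only the first passes."""
    memory = [0] * len(frames[0])
    coded = []
    for frame in frames:
        passes = toy_passes([x - m for x, m in zip(frame, memory)])
        coded.append(passes)
        for layer in passes[:reference_passes]:
            memory = [m + l for m, l in zip(memory, layer)]
    return coded

def decode_sequence(coded, reference_passes=3, drop_last_pass=False):
    """Decode; optionally simulate the loss of every pass-4 packet."""
    memory = [0] * len(coded[0][0])
    frames = []
    for passes in coded:
        for layer in passes[:reference_passes]:
            memory = [m + l for m, l in zip(memory, layer)]
        display = list(memory)
        if not drop_last_pass:
            for layer in passes[reference_passes:]:
                display = [d + l for d, l in zip(display, layer)]
        frames.append(display)
    return frames
```

In this toy model the error caused by dropping the last pass
stays bounded in every frame instead of accumulating, because
the encoder and decoder frame memories remain identical.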

In this thesis, the Kronkite motion picture with 16 frames
is used as the simulation source. Every image consists of
256x256 pixels with gray levels ranging from 0 to 255. The
test sequence is similar to a video-conferencing image
sequence, which has neither rapid motion nor scene changes.
For this reason, advanced techniques like motion detection
or motion compensation are not used here, but they could be
implemented when dealing with broadcast video.

From the values listed in Table 5.1, we can see that
the data in pass 4 represents 30-40% of the entire data.
Based on perceptual evaluation, pass 4 is deemed the least
significant pass (LSP). This part of the data increases the
sharpness of the image and is usually labeled with the
lowest priority in the network. Because they have a
substantial probability of being discarded due to low
priority, the packets from pass 4 are not used to
reconstruct the locally decoded image and are not stored in
the frame memory. This prevents errors caused by the loss
of pass 4 packets from propagating into the following
frames. It is found through simulation that this approach
increases the peak signal-to-noise ratio (PSNR) by 1-2 dB
under conditions of packet loss.

5.2 Interaction of the Coder and the Network

When the video data is packetized and sent into a
nonideal network, some problems emerge. These are discussed
in the following sections.

5.2.1 Packetization 

The task of the packetizer is to assemble video
information, coding mode information if it exists, and
synchronization information into transmission cells. To
prevent the propagation of errors resulting from packet
loss, packets are made independent of each other, and data
from the same block or the same frame is not separated into
different packets. The segmentation process in the
transport layer has no information regarding the video
format. To avoid having the bit stream cut at random
points, the packetization process has to be integrated with
the encoder, which is in the presentation layer at the
user's premises. Otherwise, some overhead has to be added
to the data stream to guide the transport layer in
performing the correct packetization. To limit the
packetization delay, it is necessary to stuff the last cell
of a video packet with dummy bits if the cell is not
completely full.

Every packet must contain an absolute address which
indicates the location of the first block it carries.
Because every block in MBCPT has the same number of bits in
each pass, there is no need to indicate the relative
addresses of the following blocks contained in the same
packet. There is always a tradeoff between packaging
efficiency and error resilience. If error resilience is
considered, each packet should contain a smaller number of
blocks. However, since each channel access by a station
carries a certain amount of overhead, the packet should be
long for transmission efficiency. Fixed-length
packetization is used in this thesis for simplicity.
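
The addressing idea can be sketched as follows for the
blocks of a single pass. The packet layout (a dictionary
with `first_block` and `payload`), the byte-oriented
payloads and the zero padding are assumptions made for
illustration; the thesis works at the bit level and does not
specify a format.

```python
def packetize_pass(block_payloads, blocks_per_packet, pad=b"\x00"):
    """Pack equal-sized block payloads into fixed-length packets.

    Each packet carries only the absolute address of its first block;
    the last packet is padded with dummy bytes to the fixed length.
    """
    size = len(block_payloads[0])
    packets = []
    for first in range(0, len(block_payloads), blocks_per_packet):
        body = b"".join(block_payloads[first:first + blocks_per_packet])
        body = body.ljust(blocks_per_packet * size, pad)
        packets.append({"first_block": first, "payload": body})
    return packets

def blocks_in_packet(packet, block_size, total_blocks):
    """Recover {absolute block address: payload} from one packet."""
    first = packet["first_block"]
    payload = packet["payload"]
    count = min(len(payload) // block_size, total_blocks - first)
    return {first + k: payload[k * block_size:(k + 1) * block_size]
            for k in range(count)}
```

Because every packet is self-describing, the blocks of the
packets that do arrive can be placed correctly even when a
neighbouring packet is lost.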

5.2.2 Error Recovery 

There is no way to guarantee that packets will not get
lost after being sent into the network. Packet loss can be
attributed to two main causes. First, bit errors can occur
in the address field, leading packets astray in the
network. Second, congestion can exceed the network's
management ability, and packets are discarded due to buffer
overflow. In MBCPT coding, the effect of losing higher-pass
packets (such as pass 4) is masked by the basic passes; the
lost data is replaced with zeros. The distortion is almost
invisible when viewed at video rates because the lost areas
are scattered spatially and over time. However, the loss of
lower-pass packets (such as pass 1), though rare because of
their high priority, creates erasure effects due to
packetization and is very objectionable.


Considering the tight time constraint, retransmission
is not feasible in packet video. It may also result in more
severe congestion. Thus, error recovery has to be performed
by the decoder alone. In our differential MBCPT scheme, the
packets from pass 4 are labeled with the lowest priority
and form a large part of the complete data. These packets
can be discarded whenever network congestion occurs. This
relieves the congestion without causing too much quality
degradation. The erasures caused by the loss of basic-pass
packets are simply covered with the reconstructed values
from the corresponding area in the previous frame. This
remedy appears insufficient even when there is only a small
amount of motion in that area. Motion detection and motion
compensation could be used to find a best-matched area in
the previous frame for replacement.
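
The simple concealment described above, without motion
compensation, can be sketched as follows. Frames are lists
of rows, and lost blocks are identified by the coordinates
of their top-left corner; both conventions are assumptions
made for this illustration.

```python
def conceal_lost_blocks(current, previous, lost_blocks, block_size):
    """Cover erased blocks with the co-located area of the previous frame.

    current, previous -- frames as lists of rows of pixel values
    lost_blocks       -- iterable of (top, left) corners of erased blocks
    """
    repaired = [row[:] for row in current]    # leave the input untouched
    for top, left in lost_blocks:
        for r in range(top, top + block_size):
            repaired[r][left:left + block_size] = \
                previous[r][left:left + block_size]
    return repaired
```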

Side information in the MBCPT decoding scheme is vital
and must not be lost. Two methods can be used for
protection. First, error control coding, such as block
codes or convolutional codes, can be applied in two
directions: along the packetization direction and
perpendicular to it. The former protects against bit errors
in the data field, while the latter protects against packet
loss. Fig. 5.5 demonstrates the second case. The minimum
distance that the error control code should provide depends
on the network's probability of packet loss, the
correlation of such losses and the channel bit error rate.
Second, from Table 5.1, we can see that the output rate of
the side information, pass 1 and even pass 2 is quite
steady. It therefore seems feasible to allocate a fixed
amount of channel capacity to these outputs to ensure their
timely arrival. That is, circuit switching can be used for
important and steady data.

5.2.3 Flow Control 

To shield the viewer from severe network congestion,
several flow control schemes are considered useful. If
there is an interaction between the encoder and the
transport layer, the encoder can be informed about the
network condition and can adjust its coding scheme
accordingly. In the MBCPT coding scheme, a buffer that is
getting full means that the bit generation rate is
overwhelming the packetization rate, and the encoder
switches to a coarser quantizer with fewer steps or loosens
the threshold to decrease its output rate. In this way,
smooth quality degradation is obtainable. However, this
also complicates the encoder design.
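
One possible form of such feedback is sketched below. The
control rule, the watermarks and the step size are all
hypothetical; the text only states that the threshold is
loosened when the buffer is getting full.

```python
def adapt_threshold(threshold, buffer_fill, high=0.8, low=0.3,
                    step=2, minimum=0, maximum=255):
    """Adjust the distortion threshold from the buffer occupancy.

    buffer_fill is the fraction of the packetization buffer in use
    (0.0 to 1.0). All numeric defaults are illustrative assumptions.
    """
    if buffer_fill > high:
        # Buffer nearly full: loosen the threshold to lower the rate.
        return min(maximum, threshold + step)
    if buffer_fill < low:
        # Buffer nearly empty: tighten the threshold to raise quality.
        return max(minimum, threshold - step)
    return threshold
```

Changing the threshold in small steps is what would make the
quality degrade smoothly rather than abruptly.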

It is possible to use the congestion control of the
network protocols to prevent drastic quality changes by
assigning different priorities to packets from different
passes. Discarding packets blindly, without identifying the
importance of each packet, can be disastrous and can cause
the session to shut down, for example if the side
information is lost. In the MBCPT coding scheme, the side
information and the packets from pass 1 are assigned the
highest priority, and packets from higher passes are
assigned decreasing priority.

5.2.4 Resynchronization 

Because of the lack of a common clock between
transmitter and receiver and the variable packet generation
rate used in packet video, resynchronization is an inherent
problem in packet transmission. Transmission delay is
irrelevant for one-way sessions, and resynchronization can
be achieved by buffering the received packets in the
receiver for a duration of L units from the start of
transmission before transferring them to the decoder. This
means that there is a constant lag of L units between the
encoder and the decoder. A packet is considered lost when
it cannot arrive within this time limit.
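
The constant-lag rule can be expressed in a few lines. The
sketch assumes that each packet is stamped with its
generation time at the encoder and that the decoder needs it
no later than L units after that time; the tuple layout and
the function name are illustrative.

```python
def split_by_deadline(packets, lag):
    """Classify packets as usable or lost under a constant lag.

    packets -- list of (generation_time, arrival_time); arrival_time is
               None for packets that never arrive
    lag     -- the constant lag L between encoder and decoder
    Returns (on_time_indices, lost_indices).
    """
    on_time, lost = [], []
    for index, (generated, arrived) in enumerate(packets):
        if arrived is not None and arrived <= generated + lag:
            on_time.append(index)
        else:
            lost.append(index)    # late packets count as lost
    return on_time, lost
```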

Although transmission delay is tolerable in one-way
transmission, it becomes critical in two-way sessions
because long delays impede information exchange. Three
methods can be employed to accomplish the resynchronization
task. The first approach is to modify the phase between the
sending and receiving clocks by skipping or repeating video
frames. The second scheme is to estimate the transmitting
frequency from the time stamps carried in the packets. Note
that this scheme cannot be adopted by a multidrop decoder
because it receives signals from more than one source. The
third method is to adjust the receiving clock with a
phase-locked loop by observing the level of the input
buffer at the receiving end.

5.2.5 Interaction with Protocols

In the ISO model, the physical, datalink and network
layers comprise the lower layers, which form a network
node. The higher layers are the transport, session,
presentation and application layers and typically reside in
the customer's premises.

The lower layers have nothing to do with signal
processing and only work as a "packet pipe". The physical
layer requires adequate capacity and a low bit error rate,
which are determined only by the technology. The datalink
layer can deal only with link management because mechanisms
such as requesting retransmission are not feasible in
packet video transmission. The network layer has to
maintain orderly transmission by removing the delay jitter
with input buffering. In addition, it can handle network
congestion by assigning transmission priority.
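
Priority-based discarding at a congested node can be
sketched as below. The priority mapping follows the MBCPT
rule stated in Section 5.2.3 (side information and pass 1
highest, higher passes decreasing); the packet
representation, with pass 0 standing for side information,
and the buffer model are assumptions.

```python
def priority(packet):
    """Smaller value = more important. Pass 0 denotes side information."""
    return 0 if packet["pass"] <= 1 else packet["pass"] - 1

def admit_packets(packets, capacity):
    """Keep at most `capacity` packets, discarding low priority first.

    Among equal priorities, earlier packets are preferred; the kept
    packets are returned in their original order.
    """
    ranked = sorted(enumerate(packets),
                    key=lambda item: (priority(item[1]), item[0]))
    kept = sorted(ranked[:capacity], key=lambda item: item[0])
    return [packet for _, packet in kept]
```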

As the higher layers reside in the customer's
premises, they perform all the functions of the packet
video coder. The transport layer does the packetization and
reassembly. The packet length can be fixed or variable. A
fixed packet length simplifies segmentation and packet
handling, while a variable packet length can keep the
packetization delay constant. The session layer supervises
set-up and tear-down for sessions of different types and
qualities. There is always a tradeoff between quality and
cost. The quality of a session can be determined at set-up
by the threshold in the coding scheme and the priority
assignment for transmission. Of course, the better the
quality, the higher the cost. Fig. 5.6 shows the tradeoff
between PSNR and video output rate obtained by adjusting
the thresholds. The presentation layer does most of the
signal processing, including separation and compression.
Because it knows the video format exactly, any error
concealment that is required is performed here. The
application layer works as a boundary between the user and
the network and deals with all the analog-digital signal
conversion.

5.3 Results from Packet Video Simulation 

Results obtained in this packet video simulation show
that high compression with good associated image quality
can be obtained using this differential MBCPT scheme.

The monochrome sequence used in this simulation 
contains 16 frames, each of size 256x256 pixels with 8 bits 
per pixel, corresponding to a bit rate of 15.3 Mbits/s 
at a video rate of 30 frames/s. As Table 5.2 shows, the 
average data rate of our system is 1.539 Mbits/s. The 
compression ratio is about 10, with a mean PSNR of 38.74 
dB as calculated from 


$$ \mathrm{PSNR} = 10 \log_{10} \left( \frac{256^2}{\sigma^2_{\mathrm{diff}}} \right) \qquad (9) $$
where 256 is the peak intensity of the image pixel and 
$\sigma^2_{\mathrm{diff}}$ is the variance of the difference 
between the original and reconstructed frames. Fig. 5.7 
shows the data rate of the sequence frames for the side 
information, the 4 passes and the total. It is clear that 
the data rate of pass 1 is constant as long as the 
quantization mode remains the same. 
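
As a small worked example, the PSNR of equation (9) can be 
evaluated as sketched below. The sketch follows the 
definition in the text (peak value 256 and the variance of 
the difference signal); the function name psnr and the 
representation of a frame as a flat list of pixel values 
are assumptions made for illustration. 

```python
import math
from statistics import pvariance


def psnr(original, reconstructed, peak=256):
    """PSNR in dB: 10*log10(peak^2 / variance of the difference)."""
    diff = [o - r for o, r in zip(original, reconstructed)]
    var = pvariance(diff)
    if var == 0:
        return math.inf  # identical frames (or a constant offset)
    return 10 * math.log10(peak ** 2 / var)


# Toy "frames" of four pixels whose difference is -1, +1, -1, +1.
original = [10, 20, 30, 40]
reconstructed = [11, 19, 31, 39]
print(round(psnr(original, reconstructed), 2))
```

Note that the variance equals the mean squared error only 
when the mean of the difference is zero, as it is in this 
toy example. 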

Side information and the data from pass 2, and even pass 3, 
are quite steady and are referred to as the Most 
Significant Pass (MSP). The data rate of pass 4 is bursty 
and highly uncorrelated, and this pass is called the Least 
Significant Pass (LSP). Fig. 5.8 shows the PSNR for each 
frame in the sequence. The standard deviation is only 
0.2 dB. In the simulation, the same threshold is used 
throughout the sequence. If constant visual quality is 
desired, a varying threshold can be used for different 
frames. That will generate a much more variable bit rate, 
and of course motion detection would be required. Comparing 
these two figures, it appears that a varying bit rate can 
support constant quality video. 

From the difference images of this sequence, frames 1-8 
(Fig. 5.9-11) seem quite motionless while frames 9-13 (Fig. 
5.12-14) contain substantial motion. We adjust the traffic 
condition of the network to force some of the packets to 
be lost in order to check the robustness of the coding 
scheme. 
Transmission delay is not considered in this simulation 
because it is not the main interest. Heavy traffic is set 
up in the motionless and motion periods separately. The 
average packet loss is 3.3%, which is considered high for 
most networks. Fig. 5.15-16 show the images which suffer 
packet loss from pass 4. As can be seen, the effect of the 
lost packets is not at all severe, even though the packet 
loss rate is unrealistically high. This is because the 
performance from the first three passes is relatively 
good. Fig. 5.17-18 show the case when packet loss occurs in 
pass 1. Clearly there are visible defects in the motion 
period. What is worse, the error propagates to the 
following frames. Apparently, the replenishing scheme 
used here is not sufficient in areas with motion. It is 
believed that this shortcoming can be eliminated with a 
motion compensation algorithm, which would find the 
appropriate area for replenishment, and with error 
concealment, which limits the propagation of error. 
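
The difference between the motionless and the motion 
periods can be illustrated with a small sketch. It assumes, 
as a simplification that is not confirmed by the text, that 
a lost block is replenished by copying the co-located block 
of the previous reconstructed frame. The function names and 
the toy 4x4 frames are hypothetical. 

```python
def conceal_lost_block(prev_frame, cur_frame, top, left, size):
    """Return a copy of cur_frame in which the (lost) block is
    replaced by the co-located block of prev_frame.
    cur_frame is the error-free frame; it is passed in only so that
    the concealment error can be measured afterwards."""
    out = [row[:] for row in cur_frame]
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = prev_frame[r][c]
    return out


def block_error(a, b, top, left, size):
    """Sum of absolute differences inside the block."""
    return sum(abs(a[r][c] - b[r][c])
               for r in range(top, top + size)
               for c in range(left, left + size))


# Motionless scene: both frames are identical.
static_prev = [[10, 10, 10, 10] for _ in range(4)]
static_cur = [row[:] for row in static_prev]
static_out = conceal_lost_block(static_prev, static_cur, 0, 0, 2)
static_err = block_error(static_out, static_cur, 0, 0, 2)

# Scene with motion: a bright object has moved into the lost block.
moving_prev = [[10, 10, 10, 10] for _ in range(4)]
moving_cur = [[200, 200, 10, 10],
              [200, 200, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10]]
moving_out = conceal_lost_block(moving_prev, moving_cur, 0, 0, 2)
moving_err = block_error(moving_out, moving_cur, 0, 0, 2)

print(static_err, moving_err)
```

Under this assumption the copied block is exact when 
nothing moves and wrong when the block content has changed, 
which is consistent with the defects observed in the motion 
period. A motion compensated replenishment would copy from 
a displaced position instead of the co-located one. 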

Figure 5.3 Performances without packet loss 
[Figure 5.5 diagram: k packets, each one packet length 
long, pass through error control coding, which outputs the 
k packets together with packets with parity bits, n packets 
in all.] 

Figure 5.5 Error control coding applied perpendicular 
to the direction of packetization. 
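
The idea of coding across packets, as in Figure 5.5, can be 
sketched with the simplest possible code: a single parity 
packet formed as the byte-wise XOR of k data packets, so 
that n = k + 1. This particular code is an assumption 
chosen for illustration; the code actually intended in the 
figure is not specified here and may use more parity 
packets. The function names are hypothetical. 

```python
def add_parity_packet(packets):
    """Append one parity packet: the byte-wise XOR of all k data
    packets. Byte i of the parity packet protects byte i of every
    data packet, i.e. the code runs across the packets."""
    length = len(packets[0])
    if any(len(p) != length for p in packets):
        raise ValueError("all packets must have the same length")
    parity = bytearray(length)
    for p in packets:
        for i, b in enumerate(p):
            parity[i] ^= b
    return list(packets) + [bytes(parity)]


def recover_single_loss(received):
    """received: list of n packets with exactly one entry set to
    None (the lost packet). Returns the k data packets with the
    lost one restored."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("this sketch recovers exactly one lost packet")
    length = len(next(p for p in received if p is not None))
    restored = bytearray(length)
    for p in received:
        if p is not None:
            for i, b in enumerate(p):
                restored[i] ^= b
    out = list(received)
    out[missing[0]] = bytes(restored)
    return out[:-1]  # drop the parity packet


data_packets = [b"\x01\x02", b"\x0f\x00", b"\xa5\x5a"]
coded = add_parity_packet(data_packets)   # n = k + 1 packets

received = list(coded)
received[1] = None                        # one packet is lost
print(recover_single_loss(received) == data_packets)
```

Because a lost packet removes only one symbol from each 
codeword, coding in this direction turns a packet loss into 
an erasure at a known position, which is easier to correct 
than an error at an unknown position. 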
FRAME      OVERHEAD   PASS 1   PASS 2   PASS 3   PASS 4    TOTAL
1              2588     4352     8400    24248    24416    64004
2              1772     4352     5992    15232    11312    38660
3              2156     4352     7168    19432    20104    53212
4              2088     4352     6888    18760    13216    45304
5              2164     4352     7112    19600    17416    50644
6              1988     4352     6328    17920    14336    44924
7              2352     4352     7448    21896    22736    58784
8              2432     4352     7952    22512    25704    62952
9              2316     4352     7504    21336    24136    59644
10             2568     4352     7840    24528    26992    66360
11             1892     4352     6048    16856    11144    40292
12             2352     4352     7616    21728    18200    54248
13             1968     4352     6384    17584    15008    46296
14             2468     4352     7840    23128    26936    64734
15             2216     4352     9352    18088      728    34736
16             1496     4352     4536    12824    12936    36164
TOTAL         34816    69632   114408   315672   287392   820992
MEAN           2176     4352     7150    19729    17962    51312
DEVIATION       290        0     1094     3179     7000    10395

Table 5.1 Output bit rate for each pass and for the total. 
The unit is bits. 
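
The distinction between the steady passes (MSP) and the 
bursty pass 4 (LSP) can be checked numerically from the 
per-frame values of Table 5.1. The sketch below uses the 
coefficient of variation (standard deviation divided by 
mean) as a burstiness measure; this measure is an 
assumption introduced for illustration and is not used in 
the text. The values are copied as printed in the table, 
and the printed per-frame entries may not reproduce the 
summary rows exactly. 

```python
from statistics import mean, pstdev

# Bits per frame for frames 1..16, as printed in Table 5.1.
PASS_BITS = {
    "pass 1": [4352] * 16,
    "pass 2": [8400, 5992, 7168, 6888, 7112, 6328, 7448, 7952,
               7504, 7840, 6048, 7616, 6384, 7840, 9352, 4536],
    "pass 3": [24248, 15232, 19432, 18760, 19600, 17920, 21896, 22512,
               21336, 24528, 16856, 21728, 17584, 23128, 18088, 12824],
    "pass 4": [24416, 11312, 20104, 13216, 17416, 14336, 22736, 25704,
               24136, 26992, 11144, 18200, 15008, 26936, 728, 12936],
}


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


cv = {name: coefficient_of_variation(bits)
      for name, bits in PASS_BITS.items()}
for name, value in cv.items():
    print(f"{name}: {value:.3f}")
```

In this measure pass 1 shows no variation at all and pass 4 
varies far more, relative to its mean, than passes 2 and 3. 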
            OVERHEAD   PASS 1   PASS 2   PASS 3   PASS 4     TOTAL
MEAN           65.28   130.56   214.50   591.87   538.86   1539.36
DEVIATION       8.70     0.00    32.82    95.37   210.00    311.85
MAXIMUM        77.04   130.56   280.56   735.84   821.52   1990.80
MINIMUM        44.88   130.56   136.08   384.72    21.84   1042.08

Table 5.2 Output bit rate for each pass and for the total, 
calculated with a video rate of 30 frames/s. The maximum 
and minimum values are the instantaneous rates, which 
correspond to the respective maximum and minimum 
number of bits needed to encode a particular frame 
in the sequence. The unit is kilobits per second. 
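
The relation between the two tables is simple arithmetic: 
a per-frame bit count from Table 5.1 multiplied by the 
frame rate gives the corresponding rate in Table 5.2. The 
sketch below reproduces the MEAN row and the compression 
ratio quoted in the text; the variable and function names 
are chosen for illustration only. 

```python
FRAME_RATE = 30                       # frames/s, as stated in the text
RAW_BITS_PER_FRAME = 256 * 256 * 8    # 256x256 pixels, 8 bits per pixel

# MEAN row of Table 5.1, in bits per frame.
MEAN_BITS_PER_FRAME = {
    "overhead": 2176,
    "pass 1": 4352,
    "pass 2": 7150,
    "pass 3": 19729,
    "pass 4": 17962,
    "total": 51312,
}


def kbits_per_second(bits_per_frame, frame_rate=FRAME_RATE):
    """Convert bits per frame to kilobits per second (1 kbit = 1000 bits)."""
    return bits_per_frame * frame_rate / 1000


rates = {name: kbits_per_second(bits)
         for name, bits in MEAN_BITS_PER_FRAME.items()}
compression_ratio = RAW_BITS_PER_FRAME / MEAN_BITS_PER_FRAME["total"]

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} kbits/s")
print(f"compression ratio: {compression_ratio:.2f}")
```

The total of 51312 bits per frame corresponds to the 
average rate of 1.539 Mbits/s, and the ratio of raw to 
coded bits per frame is slightly above 10. 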
Figure 5.18 The effect of pass 1 packet loss for frame 9. 

Chapter 6 Conclusions 


In Chapters 1 and 2 the environment of future 
telecommunications was described and some specifications 
for integrating signal processing into this environment 
were proposed. In Chapter 3 the basics of data 
compression were introduced and the characteristics of 
MBCPT were investigated. In Chapter 4 an overview of the 
network simulator used for these tests was provided and the 
required modifications were discussed. Finally, in Chapter 
5 the differential scheme of MBCPT as a packet video coder 
was proposed and its performance was examined. 

The network simulator was used only as a channel in 
this simulation. In fact, before the real-time processor is 
built, a lot of statistics can be collected from the 
network simulator to improve upon the coding scheme. These 
include transmission delays and losses from various passes 
under different network loads. For resynchronization, the 
delay jitter between received packets can also be estimated 
from this simulation. 

The environment for tomorrow's telecommunications has 
been described; it requires a flexibility which is not 
possible in a circuit-switched network. MBCPT has several 
appealing properties, such as a high compression rate with 
good visual performance, robustness to packet loss, 
tractable integration with network mechanics and simplicity 
in parallel implementation. Some further considerations 
have been proposed for the whole packet video system, such 
as protocol design, packetization, error recovery and 
resynchronization. For fast moving scenes, the differential 
MBCPT scheme seems insufficient. Motion compensation, error 
concealment or even attaching function commands to the 
coding scheme are believed to be useful tools for 
increasing the performance and will be the direction of 
future research. 